Collaborator

@forsyth2 forsyth2 commented Jun 24, 2025

Replacement of #46, to resolve #43.

@forsyth2 forsyth2 self-assigned this Jun 24, 2025
@forsyth2 forsyth2 added the documentation Files in `docs` modified label Jun 24, 2025
@forsyth2 forsyth2 mentioned this pull request Jun 25, 2025
Collaborator Author

@forsyth2 forsyth2 left a comment

@golaz This is ready for an initial review, but more work still needs to be done. Please see my questions/comments made as part of this self-review. Thanks!

The three goals of the epic are:

  1. Centralize v1 data on HPSS archive
  • Almost all v1 data is now on NERSC HPSS under /home/projects/e3sm/www/WaterCycle/E3SMv1/
  • At this point, I only have one simulation left, which is the 307 TB "HR v1 1950 control (56-135)" listed on Confluence. Back-of-the-envelope calculations showed a hsi cp would take over 2 days, but hsi mv would be instantaneous.
    • @ndkeen if you don't need the data to be in /home/n/ndk/2019/theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG, can you please move it to /home/projects/e3sm/www/WaterCycle/E3SMv1/HR/theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG?
  2. Add v1 documentation page to e3sm_data_docs
  3. Update ESGF links for native output
  • What are the correct links/templates/patterns to use?


Experiments:

The datasets include the following experiments:
Collaborator Author

This covers LR, what about HR?

@@ -0,0 +1,111 @@
**********************************

@@ -0,0 +1,52 @@
model_version, group, resolution, category, simulation_name, machine, checksum, experiment, ensemble_num, link_type, node,
Collaborator Author

  • I recall, for the v2 simulations, each simulation had a Confluence page that listed the values to use for checksum, but I don't know if v1 had that too.
  • It was a little easier to deduce the experiment & ensemble_num for LR simulations. What should the HR simulation experiments be?
  • What should the link_type (CMIP only, native, both?) be for these simulations?
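
For readers unfamiliar with the table format under discussion, here is a minimal Python sketch of how one row could be read against the header above. It is an illustration only, not the parsing code used in e3sm_data_docs: `parse_rows` is a hypothetical helper, and the sample row is the cori-knl DAMIP row that appears later in this PR.

```python
import csv
import io

# Header and one row copied from the CSV in this PR (note the trailing comma).
HEADER = ("model_version, group, resolution, category, simulation_name, "
          "machine, checksum, experiment, ensemble_num, link_type, node,")
ROW = ("v1, WaterCycle, LR, Projection, "
       "20191019.DECKv1b_P3_SSP5-8.5-GHG.ne30_oEC.cori-knl, cori-knl, , "
       "damip_ssp5-8.5-GHG, 3, none, ,")


def parse_rows(text):
    """Hypothetical helper: map each data row to a dict keyed by column name."""
    # skipinitialspace drops the space that follows each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        # The trailing comma yields one empty column name; skip it.
        rows.append({name: value.strip()
                     for name, value in zip(header, fields) if name})
    return rows


records = parse_rows(HEADER + "\n" + ROW + "\n")
print(records[0]["simulation_name"], records[0]["link_type"])
```

An empty string in a column such as checksum or node simply means the value is unknown or not applicable, which matches how the rows in this PR leave those fields blank.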

Collaborator

I don't think we had checksums. The machine is long gone, so we cannot reproduce these simulations anyway.

Comment on lines 31 to 35
v1, WaterCycle, HR, DECK, 20211021-maint-1.0-tro.A_WCYCLSSP585_CMIP6_HR.ne120_oRRS18v3_ICG.unc12-3rd-attempt, ???, , ssp5_8.5, 1, , ,
v1, WaterCycle, HR, DECK, 20200517-maint-1.0-tro.A_WCYCL20TRS_CMIP6_HR.ne120_oRRS18v3_ICG.unc11, ???, , , , , ,
v1, WaterCycle, HR, DECK, 202101027-maint-1.0-tro.A_WCYCL20TRS_CMIP6_HR.ne120_oRRS18v3_ICG.unc12, ???, , , , , ,
v1, WaterCycle, HR, DECK, theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG, , theta, , , , ,
v1, WaterCycle, HR, DECK, 20210112.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG.unc06, ???, , , , , ,
Collaborator Author

For 4 simulations, I couldn't deduce what machine they were run on.

Collaborator

Don't know either. Maybe that's not important.


Scripts are not available to reproduce v1 simulations.

Original run scripts (the scripts that were originally used to create the simulations) have been archived `here <https://github.com/E3SM-Project/e3sm_data_docs/tree/main/run_scripts/v1/original/>`_. These scripts are provided for reference only.
Collaborator Author

I'm assuming we should add original scripts, but I can't seem to find them.

Example:

> zstash ls --hpss=/home/projects/e3sm/www/WaterCycle/E3SMv1/HR/cori-knl.20190214_maint-1.0.F2010C5-CMIP6-HR.ARE.nudgeUV.1850aero.ne120_oRRS18v3 *.sh
For help, please see https://e3sm-project.github.io/zstash. Ask questions at https://github.com/E3SM-Project/zstash/discussions/categories/q-a.
build/ice/source/core_atmosphere/physics/checkout_data_files.sh
build/ice/source/core_ocean/get_cvmix.sh
build/ocn/source/core_atmosphere/physics/checkout_data_files.sh
build/ocn/source/core_ocean/get_BGC.sh
build/ocn/source/core_ocean/get_cvmix.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/Bryan-Lewis/BL_test.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/common/build.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/common/check_inputdata.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/common/environ.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/common/parse_inputs.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/common/run.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/common/usage.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/double_diff/double_diff-test.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/kpp/kpp-test.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/shear/shear-test.sh
build/ocn/source/core_ocean/.cvmix_all/reg_tests/tidal-Simmons/Simmons-test.sh
case_scripts/.env_mach_specific.sh
test01/case_scripts/.env_mach_specific.sh

None of these look like a production run script...

Collaborator

Did we list the scripts on Confluence?

Collaborator Author

I finally found the v2 simulation Confluence pages, e.g., v2.LR.piControl but I'm not seeing clear equivalents anywhere from v1. In any case, that page links to a run script, and from there it looks like there may be some v1 run scripts.

Collaborator Author

Run scripts might be in the provenance files. In index.rst, tell users to search for case_scripts/run_script_provenance/; they can then use zstash extract to retrieve those files. We don't have to worry about adding run scripts to this repo.
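
The suggestion above could be sketched as follows in Python. This is only an illustration, assuming `zstash ls` and `zstash extract` accept a `--hpss` path followed by a file pattern, as the `zstash ls` call earlier in this thread does. `build_zstash_cmd` is a hypothetical helper, and whether a given archive actually contains `case_scripts/run_script_provenance/` is not confirmed here.

```python
import shlex
from typing import List

# Archive path taken from the zstash ls example earlier in this thread.
HPSS_PATH = ("/home/projects/e3sm/www/WaterCycle/E3SMv1/HR/"
             "cori-knl.20190214_maint-1.0.F2010C5-CMIP6-HR.ARE.nudgeUV."
             "1850aero.ne120_oRRS18v3")
# Assumed location of provenance run scripts inside the archive.
PATTERN = "case_scripts/run_script_provenance/*"


def build_zstash_cmd(action: str, hpss_path: str, pattern: str) -> List[str]:
    """Hypothetical helper: build the argument list for zstash ls/extract."""
    if action not in ("ls", "extract"):
        raise ValueError(f"unsupported zstash action: {action}")
    return ["zstash", action, f"--hpss={hpss_path}", pattern]


# First list matching files, then extract them if any are found.
ls_cmd = build_zstash_cmd("ls", HPSS_PATH, PATTERN)
extract_cmd = build_zstash_cmd("extract", HPSS_PATH, PATTERN)

# Print shell-quoted commands; on a machine with HPSS access they could be
# run with subprocess.run(ls_cmd, check=True) instead.
print(shlex.join(ls_cmd))
print(shlex.join(extract_cmd))
```

Listing before extracting keeps the check cheap, since `zstash ls` only reads the archive index rather than pulling tar files from HPSS.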

@forsyth2 forsyth2 requested a review from golaz June 25, 2025 20:40
@forsyth2 forsyth2 marked this pull request as ready for review June 25, 2025 20:40
Collaborator Author

@forsyth2 forsyth2 left a comment

Rendering: https://portal.nersc.gov/cfs/e3sm/forsyth/data_docs_59/html/v1/WaterCycle/simulation_data/simulation_table.html

Most importantly, we need correct ESGF links. Other than that, it would be nice to categorize/describe the simulations a little better.

I added what run scripts I could find, but not all of them appear to be available anywhere.

Comment on lines 52 to 64
TODO: Find remaining original run scripts
TODO: Add descriptions for added LR simulations above
TODO: Correctly categorize HR simulations
TODO: Determine correct CMIP/Native Links
Collaborator Author

Remaining TODOs

Collaborator Author

@forsyth2 forsyth2 Jun 27, 2025

Also change group to e3sm in /home/projects/e3sm/www/WaterCycle/E3SMv1/. If @ndkeen can move the remaining simulation then we'll also need the e3sm group for that.

Collaborator Author

Running chgrp -R e3sm /home/projects/e3sm/www/WaterCycle/E3SMv1/

Guys, stop tagging me, pls. :)

Collaborator Author

Sorry about that! Thanks @rljacob for updating the references to the correct tag!

Collaborator Author

@forsyth2 forsyth2 left a comment

Notes from meeting with @golaz

  • LRtunedHR -- LR with HR parameters, leave them under HR. Add "LRtunedHR" group for simulations that contain that substring
  • nudgeUV doesn't need further categorization, U (W/E) V (N/S) are wind directions.


* DAMIP

* damip_hist-GHG 3 ensembles
Collaborator Author

GHG only

Comment on lines 55 to 56
* ssp5-8.5 5 ensembles
* damip_ssp5-8.5-GHG 3 ensembles
Collaborator Author

future projection
future projection with GHG only

Collaborator Author

"ensemble members" not ensembles


Scripts are not available to reproduce v1 simulations.

Original run scripts (the scripts that were originally used to create the simulations) have been archived `here <https://github.com/E3SM-Project/e3sm_data_docs/tree/main/run_scripts/v1/original/>`_. These scripts are provided for reference only.
Collaborator Author

Run scripts might be in the provenance files. In index.rst, tell users to search for case_scripts/run_script_provenance/; they can then use zstash extract to retrieve those files. We don't have to worry about adding run scripts to this repo.

Comment on lines 22 to 24
* *The DOE E3SM Coupled Model Version 1: Overview and Evaluation at Standard Resolution* `doi: 10.1029/2018MS001603 <https://doi.org/10.1029/2018MS001603>_`
* *Description of historical and future projection simulations by the global coupled E3SMv1.0 model as used in CMIP6* `doi:10.5194/gmd-15-3941-2022 <https://doi.org/10.5194/gmd-15-3941-2022>_`
* *The DOE E3SM Coupled Model Version 1: Description and Results at High Resolution* `doi:10.1029/2019MS001870 <https://doi.org/10.1029/2019MS001870>`_
Collaborator Author

Get links to show up correctly
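
One likely cause, judging from the three lines above: the first two end their link with `>_` followed by a backtick, while the third uses the standard reStructuredText form of `>`, a backtick, then `_`. A minimal Python sketch of a check-and-repair for that swapped suffix follows; `fix_swapped_link_suffix` is a hypothetical helper, not part of this repo.

```python
import re

# Matches an RST external link whose closing backtick and underscore are
# swapped, i.e. `text <url>_` where `text <url>`_ was intended.
SWAPPED_SUFFIX = re.compile(r"`([^`<]+<[^`>]+>)_`")


def fix_swapped_link_suffix(line: str) -> str:
    """Hypothetical helper: rewrite `text <url>_` as `text <url>`_."""
    return SWAPPED_SUFFIX.sub(r"`\1`_", line)


broken = ("`doi: 10.1029/2018MS001603 "
          "<https://doi.org/10.1029/2018MS001603>_`")
correct = ("`doi:10.1029/2019MS001870 "
           "<https://doi.org/10.1029/2019MS001870>`_")

print(fix_swapped_link_suffix(broken))
print(fix_swapped_link_suffix(correct))  # already well-formed, left unchanged
```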

@@ -37,22 +37,25 @@ def get_data_size_and_hpss(hpss_path: str) -> Tuple[str, str]:
hpss = ""
return (data_size, hpss)

-def get_esgf(source_id: str, model_version: str, experiment: str, ensemble_num: str, cmip_only: str, node: str) -> str:
+def get_esgf(source_id: str, model_version: str, experiment: str, ensemble_num: str, link_type: str, node: str) -> str:
Collaborator Author

Sasha might know about what the native links should be. Tony may know as well.

We may not be able to find the links.

Collaborator Author

@forsyth2 forsyth2 left a comment

Merging this. Notes below (and in comments linked to this review).

Preview is at https://portal.nersc.gov/cfs/e3sm/forsyth/data_docs_59/html/v1/WaterCycle/index.html. Once merged, the updates will be visible at https://docs.e3sm.org/e3sm_data_docs/_build/html/v1/WaterCycle/index.html.

This PR resolves this epic:

  1. Centralize v1 data on HPSS archive
  • That is, moving data from modelers' home directories to /home/projects/e3sm/www/WaterCycle/E3SMv1/.
  • This is complete except for theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG (see comment).
  • @TonyB9000 This is specifically about the "Zstash-archived simulation output", not the "Formerly-published E3SM datasets previously accessible via ESGF links to /p/user_pub" you mentioned in an email.
  2. Add v1 documentation page to e3sm_data_docs
  • These are the index.rst pages added in this PR, describing the simulations.
  3. Update ESGF links for native output
  • This is out of scope for this PR, since the v1 data is no longer directly available on ESGF.
  • @TonyB9000 for reference, the idea would have been to have ESGF links similar to those on v2, where we have both CMIP and native links:
https://esgf-node.llnl.gov/search/cmip6/?source_id=E3SM-2-0&experiment_id=piControl&variant_label=r1i1p1f1
https://esgf-node.llnl.gov/search/e3sm/?model_version=2_0&experiment=piControl&ensemble_member=ens1
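
As a rough illustration of the two v2 link patterns above, the following Python sketch rebuilds them from their query parameters. The function names `cmip_link` and `native_link` are hypothetical and are not the `get_esgf` implementation in this repo; the sketch only reproduces the v2 URLs quoted here and says nothing about what would work for v1.

```python
from urllib.parse import urlencode

# Base URL taken from the two v2 example links above.
ESGF_BASE = "https://esgf-node.llnl.gov/search"


def cmip_link(source_id: str, experiment: str, variant_label: str) -> str:
    """Hypothetical helper: CMIP6 search link for one experiment."""
    query = urlencode({
        "source_id": source_id,
        "experiment_id": experiment,
        "variant_label": variant_label,
    })
    return f"{ESGF_BASE}/cmip6/?{query}"


def native_link(model_version: str, experiment: str, ensemble_member: str) -> str:
    """Hypothetical helper: E3SM native-output search link."""
    query = urlencode({
        "model_version": model_version,
        "experiment": experiment,
        "ensemble_member": ensemble_member,
    })
    return f"{ESGF_BASE}/e3sm/?{query}"


print(cmip_link("E3SM-2-0", "piControl", "r1i1p1f1"))
print(native_link("2_0", "piControl", "ens1"))
```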

So, remaining action items after merging this PR:

  • Move theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG to centralized location.
  • Determine some way to show ESGF links for v1 data.

E3SMv1 (Water Cycle)
====================

The `E3SM version 1 water cycle simulation campaign <https://e3sm.org/research/water-cycle/v1-water-cycle/>`_ includes standard set of
Collaborator Author

"website is undergoing maintenance"

with ocean and sea ice grid of 60 km in the mid-latitudes and 30 km at the equator and poles,
and river transport at 55 km horizontal resolution.
This model configuration is described in
`“v1 1 deg CMIP” <https://e3sm.org/model/scientifically-validated-configurations/v1-configurations/v1-1-deg-cmip6/?preview=true>`_ page
Collaborator Author

"website is undergoing maintenance"

and river transport at 55 km horizontal resolution.
This model configuration is described in
`“v1 1 deg CMIP” <https://e3sm.org/model/scientifically-validated-configurations/v1-configurations/v1-1-deg-cmip6/?preview=true>`_ page
in `Scientifically Validated Configurations <https://e3sm.org/model/scientifically-validated-configurations/>`_.
Collaborator Author

"website is undergoing maintenance"

in `Scientifically Validated Configurations <https://e3sm.org/model/scientifically-validated-configurations/>`_.

For more details,
refer to `Coupled E3SM v1 Model Overview <https://e3sm.org/?p=5470>`_ or to the reference papers:
Collaborator Author

"website is undergoing maintenance"


Scripts are not available to reproduce v1 simulations.

Some original run scripts (the scripts that were originally used to create the simulations) have been archived `here <https://github.com/E3SM-Project/e3sm_data_docs/tree/main/run_scripts/v1/original/>`_. If a script is not collected there, you can try looking for the provenance run script with:
Collaborator Author

This link will work once this PR is merged.

v1, WaterCycle, LR, Projection, 20191019.DECKv1b_P3_SSP5-8.5-GHG.ne30_oEC.cori-knl, cori-knl, , damip_ssp5-8.5-GHG, 3, none, ,
v1, WaterCycle, HR, Control Runs, theta.20180906.branch_noCNT.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG, theta, , 1950S, 1, none, ,
v1, WaterCycle, HR, Control Runs, theta.20190910.branch_noCNT.n438b.unc03.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG, theta, , 1950S, 2, none, ,
v1, WaterCycle, HR, Control Runs, theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG, theta, , 1950S, 3, none, ,
Collaborator Author

This still needs to be moved to the centralized location:

hsi
mv /home/n/ndk/2019/theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG /home/projects/e3sm/www/WaterCycle/E3SMv1/HR/theta.20190910.branch_noCNT.n825def.unc06.A_WCYCL1950S_CMIP6_HR.ne120_oRRS18v3_ICG

mv must be used because cp, by back-of-the-envelope calculation, would take over 50 hours. mv can only be done by the file owner.
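
For context, the kind of back-of-the-envelope estimate mentioned above can be sketched in Python as follows. The 307 TB size comes from earlier in this thread; the 1.5 GB/s sustained copy rate is purely an assumed figure for illustration and is not a measured HPSS number.

```python
def transfer_hours(size_tb: float, rate_gb_per_s: float) -> float:
    """Estimate copy time in hours, using decimal units (1 TB = 1000 GB)."""
    size_gb = size_tb * 1000.0
    seconds = size_gb / rate_gb_per_s
    return seconds / 3600.0


SIZE_TB = 307.0              # size of the remaining HR simulation (from this thread)
ASSUMED_RATE_GB_PER_S = 1.5  # assumption for illustration only

hours = transfer_hours(SIZE_TB, ASSUMED_RATE_GB_PER_S)
print(f"Estimated copy time: {hours:.1f} hours")
```

Under that assumed rate the copy lands in the same range as the "over 50 hours" estimate, whereas a move within HPSS only updates metadata and avoids the transfer entirely.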

Collaborator Author

@ndkeen I don't actually need a full review here, I just need you to do the above. Thanks!

@forsyth2 forsyth2 merged commit e0190d3 into main Jun 30, 2025
1 check passed
@forsyth2 forsyth2 deleted the issue-43-2025 branch June 30, 2025 22:48
@TonyB9000
Collaborator

@forsyth2 This looks great Ryan. Re-establishing ESGF access to the "formerly published" v1 sets is something we might engineer - depending upon demand and priorities. Had I known there was to be an interest in maintaining ESGF access, I think we might have engineered a different archive format (maybe, "per-dataset" tar-files, etc).

@rljacob rljacob requested a review from ndkeen July 1, 2025 15:50
@chengzhuzhang chengzhuzhang self-requested a review July 8, 2025 21:37
@chengzhuzhang
Collaborator

@forsyth2 This looks great Ryan. Re-establishing ESGF access to the "formerly published" v1 sets is something we might engineer - depending upon demand and priorities. Had I known there was to be an interest in maintaining ESGF access, I think we might have engineered a different archive format (maybe, "per-dataset" tar-files, etc).

@forsyth2 thanks for working on this. For the ESGF links, what we should include is the CMIP-formatted v1 simulations. Example: 20180129.DECKv1b_piControl.ne30_oEC.edison has ESGF url: https://esgf-node.ornl.gov/search?project=CMIP6&activeFacets=%7B%22institution_id%22%3A%22E3SM-Project%22%2C%22source_id%22%3A%22E3SM-1-0%22%2C%22experiment_id%22%3A%22piControl%22%7D

Also, I think we should add v1 Large Ensemble as well.

Both can be added in a new PR.

@forsyth2
Collaborator Author

For the ESGF links, what we should include is the CMIP-formatted v1 simulations

#60

Also, I think we should add v1 Large Ensemble as well.

I thought we weren't planning to add them, which is why I didn't include them initially. But yes, we can certainly add those in as well. I sent a support ticket into NERSC with you cc'd for guidance on copying the data. (Let me know if we're happy with the /home/projects/e3sm/www/publication-archives/ paths and don't need to copy to /home/projects/e3sm/www/WaterCycle/E3SMv1/LR/ though)
